January 5, 2019

The Assignment

Create a web page presentation using R Markdown that features a plot created with Plotly. Host your webpage on either GitHub Pages, RPubs, or NeoCities. Your webpage must contain the date that you created the document, and it must contain a plot created with Plotly.

The Problem

A number of factors influence the rate at which Cub Scouts (the Boy Scouts of America program for 1st through 4th grade children) earn achievements that acknowledge the learning activities they have completed. To build a model that can guide adult leaders at the district level, a data set was assembled with the following elements:

  • % Trained Leaders (Den Leaders, Cubmasters)
  • % Youth Retention (Youth in the pack more than 1 year)
  • % Religious Units (Percent of units in the district sponsored by a single religious tradition)
  • Total Units (total number of Cub Scout Packs in a district)

Collecting and Processing Data

The data for the analysis were gathered by averaging 12 months of data for each parameter from 22 Boy Scouts of America districts in the Western United States. An initial linear regression of Cub Scout advancement (CS_Adv) on all four predictors produces a model with the following coefficients:

## 
## Call:
## lm(formula = CS_Adv ~ Trained_Leader + Youth_Retention + Rel_Units + 
##     total_units, data = BSA.Advancement.Correlation)
## 
## Coefficients:
##     (Intercept)   Trained_Leader  Youth_Retention        Rel_Units  
##       0.1581488        0.3028774        1.0360703       -0.7358420  
##     total_units  
##       0.0002845
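
Written out as an equation, the coefficients above correspond to the fitted model below. This is only a restatement of the printed output, with coefficients rounded to four significant figures; the hat on CS_Adv denotes the predicted value.

```latex
\widehat{\mathrm{CS\_Adv}} = 0.1581
  + 0.3029\,\mathrm{Trained\_Leader}
  + 1.036\,\mathrm{Youth\_Retention}
  - 0.7358\,\mathrm{Rel\_Units}
  + 0.0002845\,\mathrm{total\_units}
```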

Collecting and Processing Data 2

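To find a more parsimonious model, stepwise selection by AIC (Akaike information criterion) was applied. Dropping total_units lowered the AIC from -78.65 to -79.23, and no further addition or removal improved it, so the search stopped after two iterations.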

## Start:  AIC=-78.65
## CS_Adv ~ Trained_Leader + Youth_Retention + Rel_Units + total_units
## 
##                   Df Sum of Sq     RSS     AIC
## - total_units      1  0.026076 0.41734 -79.228
## - Trained_Leader   1  0.030531 0.42180 -78.994
## <none>                         0.39127 -78.647
## - Youth_Retention  1  0.131669 0.52294 -74.265
## - Rel_Units        1  0.159741 0.55101 -73.115
## 
## Step:  AIC=-79.23
## CS_Adv ~ Trained_Leader + Youth_Retention + Rel_Units
## 
##                   Df Sum of Sq     RSS     AIC
## <none>                         0.41734 -79.228
## - Trained_Leader   1  0.042844 0.46019 -79.078
## + total_units      1  0.026076 0.39127 -78.647
## - Rel_Units        1  0.151788 0.56913 -74.403
## - Youth_Retention  1  0.179111 0.59645 -73.372
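
The AIC values in the output can be checked by hand from the residual sums of squares. The sketch below assumes the criterion is the one R's `extractAIC()` uses for linear models, `n * log(RSS / n) + 2 * edf`, where `edf` counts the estimated coefficients including the intercept. The helper name `aic_lm` is hypothetical, and `n = 22` is taken from the 22 districts described earlier.

```r
# Hypothetical helper: AIC of a linear model as step() reports it,
# computed from the residual sum of squares (rss), the number of
# observations (n), and the number of estimated coefficients (edf).
aic_lm <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n <- 22  # one observation per district

# Full model: intercept + 4 predictors, RSS from the "<none>" row at Start
aic_full <- aic_lm(rss = 0.39127, n = n, edf = 5)

# Reduced model: intercept + 3 predictors, RSS from the "- total_units" row
aic_reduced <- aic_lm(rss = 0.41734, n = n, edf = 4)

print(round(c(full = aic_full, reduced = aic_reduced), 2))
```

Under these assumptions the two values round to -78.65 and -79.23, matching the Start and Step lines of the output.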

Best Model v. Base Model

A base model using only Trained_Leader was created for comparison, and ANOVA was used to compare it with the best model (best_model, the three-predictor model selected above).

base_model.cs <- lm(CS_Adv ~ Trained_Leader, data = BSA.Advancement.Correlation)
anova(base_model.cs, best_model)
## Analysis of Variance Table
## 
## Model 1: CS_Adv ~ Trained_Leader
## Model 2: CS_Adv ~ Trained_Leader + Youth_Retention + Rel_Units
##   Res.Df     RSS Df Sum of Sq      F  Pr(>F)  
## 1     20 0.62124                              
## 2     18 0.41734  2   0.20389 4.3969 0.02787 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The improvement in fit over the base model is significant at the 95% level (p = 0.028), but not at the 99% level.
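
The F statistic and p-value in the ANOVA table can be reproduced from the two residual sums of squares. This is a minimal sketch using only the numbers printed in the table; the variable names are illustrative and not taken from the original analysis.

```r
# Residual sums of squares and residual degrees of freedom from the table
rss_base <- 0.62124; df_base <- 20   # Model 1: CS_Adv ~ Trained_Leader
rss_best <- 0.41734; df_best <- 18   # Model 2: adds Youth_Retention, Rel_Units

# F = (drop in RSS per added parameter) / (residual variance of larger model)
df_diff <- df_base - df_best
f_stat  <- ((rss_base - rss_best) / df_diff) / (rss_best / df_best)

# Upper-tail probability of the F distribution with (df_diff, df_best) df
p_value <- pf(f_stat, df_diff, df_best, lower.tail = FALSE)

print(round(c(F = f_stat, p = p_value), 4))
```

The result agrees with the table up to rounding of the printed inputs: F is close to 4.40 and the p-value falls between 0.01 and 0.05, which is why the table marks it with a single asterisk.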

Slide with Plot

This page displays 3-D graphics that may take a few seconds to load.

If you get a "WebGL is not supported by your browser..." error on this page, you need to enable 3-D graphics in your browser's settings.